Introduction
B-cell acute lymphoblastic leukemia (B-ALL) presents unique diagnostic challenges due to its complex cellular makeup. Although flow cytometry, especially with EuroFlow standards, offers detailed insights, it is still a time-consuming process susceptible to variations between observers. The integration of machine learning (ML) can enhance this workflow, ensuring consistent and accurate diagnostic outcomes. This study examines how ML can be applied to flow cytometry data for identifying measurable residual disease (MRD) in B-ALL.
Objectives
The objective of this project was to develop and validate an ML model for the accurate detection of MRD in B-ALL using standardized flow cytometry data from a real-world diagnostic unit in Spain.
Methodology
We collected 595 samples processed according to EuroFlow standards for MRD detection in B-ALL. These samples included two different antibody panels: the first panel targeted CD81, CD304+CD73, CD34, CD10, CD20, CD45, CD38, and CD19, while the second panel targeted CD81, CD66c+CD73, CD34, CD10, CD20, CD45, CD38, and CD19. All studies were conducted on follow-up samples analyzed for MRD detection or due to suspicion of relapse.
Preprocessing involved using the Bioconductor package flowAI to remove doublets, margins, and artifacts. A gating strategy was then applied to enrich the analysis for B cells, focusing on extracting predominantly positive events from each sample. To address dataset imbalances, we employed the Synthetic Minority Over-sampling Technique (SMOTE) in the training set.
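The core idea behind SMOTE can be illustrated with a minimal sketch: a synthetic sample is placed at a random point on the segment between a minority-class sample and one of its nearest minority-class neighbors. The function name, parameters, and toy values below are hypothetical, and this simplified standard-library version is not the implementation used in the study.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating toward near neighbors."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbors than exist
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point, excluding the point itself
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Toy two-feature minority class (hypothetical values, not study data).
minority = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.3, 0.25)]
new_points = smote_sketch(minority, n_synthetic=6, k=2)
print(len(new_points), "synthetic samples generated")
```

Because every synthetic point is interpolated between two existing minority samples, oversampling should be confined to the training set, as described above, so that no synthetic data reaches the test set.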
For clustering, we used FlowSOM to extract clusters and metaclusters from each tube, which were subsequently fed into a random forest classifier. The model was trained and cross-validated on the training set, followed by independent validation on the test set.
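The abstract does not specify how the cluster output was encoded for the classifier. One common encoding, assumed here purely for illustration, is the fraction of a sample's events that falls in each metacluster, which yields one fixed-length feature vector per sample. The function name and the labels below are hypothetical.

```python
from collections import Counter


def metacluster_fractions(event_labels, n_metaclusters):
    """Return the fraction of a sample's events assigned to each metacluster."""
    counts = Counter(event_labels)
    total = len(event_labels)
    return [counts.get(m, 0) / total for m in range(n_metaclusters)]


# Hypothetical metacluster labels for the events of one sample (not study data).
sample_events = [0, 0, 1, 2, 2, 2, 0, 1, 2, 2]
features = metacluster_fractions(sample_events, n_metaclusters=4)
print(features)  # → [0.3, 0.2, 0.5, 0.0]
```

Under this assumption, each sample contributes one row of fractions, and the rows from all samples form the table on which a classifier such as a random forest would be trained.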
Results
The training set comprised 800 samples with a slightly unbalanced distribution of class labels (500 positive, 300 negative), due to the initial dataset lacking sufficient positive samples. The random forest model exhibited robust performance during the training phase, achieving an out-of-bag (OOB) area under the curve (AUC) of 99.44%, a precision-recall (PR) AUC of 99.05%, and a Brier score of 0.123, indicating high accuracy. The OOB G-mean was 0.96, with a misclassification rate of 3.12%. The confusion matrix indicated a class error of 2.90% for the negative class and 3.27% for the positive class.
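For readers less familiar with the Brier score, it is the mean squared difference between the predicted probability and the observed 0/1 outcome, so 0 is a perfect score and a constant prediction of 0.5 scores 0.25. A minimal sketch with hypothetical toy values (not study data):

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical labels and predicted probabilities of the positive class.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.1]
score = brier_score(y_true, y_prob)
print(f"Brier score: {score:.3f}")

# An uninformative model that always predicts 0.5 scores 0.25.
print(brier_score([1, 0], [0.5, 0.5]))
```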
The test set consisted of 92 samples. During the validation phase, the model maintained strong performance, with an AUC of 92.93%, a PR-AUC of 0.92, and a Brier score of 0.08. The G-mean for the test set was 0.83, with a misclassification rate of 10.87%. The confusion matrix for the test set showed a class error of 0% for the negative class and 30.3% for the positive class. Further analysis of the test set revealed that 56 samples were extracted due to suspicion of disease relapse, with a misclassification rate of 5%, while 82 samples were obtained for MRD detection, where the rate increased to 13.58%.
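The overall test-set figures can be cross-checked with simple arithmetic. For 92 samples, a 0% negative-class error and a 30.3% positive-class error are consistent with a confusion matrix of 59 true negatives, 0 false positives, 23 true positives, and 10 false negatives. These counts are inferred from the reported percentages and are not stated in the abstract. The sketch below recomputes the metrics from them, taking the G-mean as the geometric mean of sensitivity and specificity.

```python
import math


def summarize(tn, fp, fn, tp):
    """Derive error rates and the G-mean from binary confusion-matrix counts."""
    total = tn + fp + fn + tp
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)
    return {
        "misclassification_rate": (fp + fn) / total,
        "negative_class_error": fp / (tn + fp),
        "positive_class_error": fn / (tp + fn),
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# Counts inferred from the reported percentages (an assumption, see above).
metrics = summarize(tn=59, fp=0, fn=10, tp=23)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

After rounding, the recomputed values agree with the reported misclassification rate of 10.87%, positive-class error of 30.3%, and G-mean of 0.83.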
In addition to the robust performance metrics, the implementation of our ML model within the SmartCytoFlow platform has significantly streamlined the diagnostic workflow. SmartCytoFlow automates data preprocessing, gating, clustering, and classification, providing real-time diagnostic support. The integration has resulted in a substantial reduction in analysis time and improved diagnostic consistency.
Conclusion
Our study highlights the effectiveness of integrating ML with flow cytometry for monitoring B-ALL. This approach complements traditional diagnostic methods, providing consistent and accurate results. Further validation in diverse cohorts is needed to establish its broader applicability in routine practice.
Mosquera Orgueira: AstraZeneca: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Abbvie: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Pfizer: Consultancy; Roche: Consultancy; Takeda: Speakers Bureau; Incyte: Other; Novartis: Other; GSK: Consultancy; Biodigital THX: Current equity holder in private company; Janssen: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau.